Enhancing Software-Related Information Extraction via Single-Choice Question Answering with Large Language Models
Otto, Wolfgang, Upadhyaya, Sharmila, Dietze, Stefan
This paper describes our participation in the Shared Task on Software Mentions Disambiguation (SOMD), with a focus on improving relation extraction in scholarly texts through generative Large Language Models (LLMs) using single-choice question-answering. The methodology prioritises the use of in-context learning capabilities of LLMs to extract software-related entities and their descriptive attributes, such as distributive information. Our approach uses Retrieval-Augmented Generation (RAG) techniques and LLMs for Named Entity Recognition (NER) and Attributive NER to identify relationships between extracted software entities, providing a structured solution for analysing software citations in academic literature. The paper provides a detailed description of our approach, demonstrating how using LLMs in a single-choice QA paradigm can greatly enhance IE methodologies. Our participation in the SOMD shared task highlights the importance of precise software citation practices and showcases our system's ability to overcome the challenges of disambiguating and extracting relationships between software mentions. This sets the groundwork for future research and development in this field.
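The single-choice QA framing described above can be illustrated with a minimal sketch. The abstract does not give the authors' prompt, so everything here is an assumption made for illustration: the prompt wording, the "none of the above" option, and the hypothetical helper names `build_single_choice_prompt` and `parse_choice`. The call to an actual LLM is deliberately left out.

```python
import re
import string
from typing import List, Optional


def build_single_choice_prompt(sentence: str, attribute: str, candidates: List[str]) -> str:
    """Format one relation decision as a single-choice question (hypothetical wording)."""
    letters = string.ascii_uppercase
    options = [f"{letters[i]}) {name}" for i, name in enumerate(candidates)]
    # Assumption: a final option lets the model reject every candidate.
    options.append(f"{letters[len(candidates)]}) none of the above")
    return (
        f"Sentence: {sentence}\n"
        f"Question: Which software does the mention \"{attribute}\" refer to?\n"
        + "\n".join(options)
        + "\nAnswer with a single letter."
    )


def parse_choice(answer: str, candidates: List[str]) -> Optional[str]:
    """Map a reply letter back to a candidate; None if unparsable or 'none of the above'."""
    match = re.search(r"\b([A-Z])\b", answer.strip().upper())
    if match is None:
        return None
    index = ord(match.group(1)) - ord("A")
    return candidates[index] if 0 <= index < len(candidates) else None


candidates = ["SPSS", "R"]
prompt = build_single_choice_prompt(
    "We used SPSS and R (version 4.1) for the analysis.", "version 4.1", candidates
)
# A reply of "B" would link the attribute "version 4.1" to the software "R".
linked = parse_choice("B", candidates)
```

Restricting the reply to one letter keeps the output easy to parse and reduces each relation decision to a choice among already extracted entities.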
A Biomedical Knowledge Graph for Biomarker Discovery in Cancer
Karim, Md. Rezaul, Comet, Lina Molinas, Beyan, Oya, Rebholz-Schuhmann, Dietrich, Decker, Stefan
Structured and unstructured data and facts about drugs, genes, proteins, viruses, and their mechanisms are spread across a huge number of scientific articles. These articles are a large-scale knowledge source and can have a huge impact on disseminating knowledge about the mechanisms of certain biological processes. A domain-specific knowledge graph (KG) is an explicit conceptualization of a specific subject-matter domain, represented in terms of semantically interrelated entities and relations. A KG can be constructed by integrating such facts and data and can be used for data integration, exploration, and federated queries. However, exploring and querying large-scale KGs is tedious for certain groups of users due to a lack of knowledge about the underlying data assets or semantic technologies. Such a KG not only allows deducing new knowledge and question answering (QA) but also allows domain experts to explore it. Since cross-disciplinary explanations are important for accurate diagnosis, it is important to query the KG to provide interactive explanations about learned biomarkers. Motivated by these observations, we construct a domain-specific KG for cancer-specific biomarker discovery. The KG is constructed by integrating cancer-related knowledge and facts from multiple sources. First, we construct a domain-specific ontology, which we call the OncoNet Ontology (ONO). The ONO is developed to enable semantic reasoning for verifying predictions of relations between diseases and genes. The KG is then developed and enriched by harmonizing the ONO, additional metadata schemas, ontologies, controlled vocabularies, and additional concepts from external sources using a BERT-based information extraction method. BioBERT and SciBERT are fine-tuned on selected articles crawled from PubMed. We list several example queries and show examples of QA and knowledge deduction based on the KG.
Semantic CPPS in Industry 4.0
Fenza, Giuseppe, Gallo, Mariacristina, Loia, Vincenzo, Marino, Domenico, Orciuoli, Francesco, Volpe, Alberto
Cyber-Physical Systems (CPS) play a crucial role in the era of the 4th Industrial Revolution. Recently, the application of CPS to industrial manufacturing has led to a specialization referred to as Cyber-Physical Production Systems (CPPS). Among other challenges, CPS and CPPS should be able to address interoperability issues, since one of their intrinsic requirements is the capability to interface and cooperate with other systems. Moreover, fully realizing the Industry 4.0 vision requires horizontal, vertical, and end-to-end integration that enables complete awareness across the entire supply chain. In this context, Semantic Web standards and technologies are promising for representing manufacturing knowledge in a machine-interpretable way, enabling communication among heterogeneous industrial assets. This paper proposes an integration of state-of-the-art Semantic Web models to implement a 5C architecture, mainly targeted at collecting and processing semantic data streams in a way that unlocks the potential of the data produced in a smart manufacturing environment. The analysis of key industrial ontologies and semantic technologies allows us to instantiate an example scenario for monitoring Overall Equipment Effectiveness (OEE). The solution uses the SOSA ontology to represent the semantic data stream. C-SPARQL queries are then defined to periodically compute useful KPIs for this purpose.
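As a rough illustration of the KPI being monitored, the sketch below computes OEE using its commonly used definition as the product of availability, performance, and quality. The abstract does not state the formula or the input fields, and the paper computes its KPIs with C-SPARQL queries over SOSA observations rather than in Python; the function name `compute_oee`, its parameters, and the sample values are all hypothetical.

```python
def compute_oee(
    planned_time: float,
    run_time: float,
    ideal_cycle_time: float,
    total_count: int,
    good_count: int,
) -> float:
    """Return OEE = availability * performance * quality.

    Times share one unit (e.g. minutes); ideal_cycle_time is the time per piece.
    """
    if planned_time <= 0 or run_time <= 0 or total_count <= 0:
        raise ValueError("planned_time, run_time and total_count must be positive")
    availability = run_time / planned_time  # share of planned time spent running
    performance = (ideal_cycle_time * total_count) / run_time  # speed vs. ideal rate
    quality = good_count / total_count  # share of pieces without defects
    return availability * performance * quality


# Illustrative values only: 400 min planned, 320 min running,
# 0.5 min ideal cycle time, 480 pieces produced, 432 of them good.
oee = compute_oee(400.0, 320.0, 0.5, 480, 432)
```

With these sample values the three factors are 0.8, 0.75, and 0.9, so the resulting OEE is about 0.54.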